WIP: Peer Discovery DEP #7

pfrazee · 2018-02-07T02:09:34Z

This one is almost ready to go. Needs review and a few TODO items which should be easy to resolve.

TODOs

- add a section about hole punching
- add a section about the discovery key
- fix possible amplification attack?

Some things Maf and I plan to change in the implementation in a debug session soon:
- fix encoding to include all 32 bytes (not the 20 currently)
- switch to blake2b in token gen
- rethink how push works to make sure we notify (almost) everyone, or possibly switch to pull only
- consider a new hole-punching system that replaces push announcements

mafintosh · 2018-02-08T10:53:48Z

proposals/0000-peer-discovery.md

+
+Multicast DNS (mDNS) resolves host names to IP addresses within small networks without a local name server. It is a zero-configuration service, using essentially the same interfaces, packet formats and operating semantics as unicast DNS. The mDNS protocol is published as [RFC 6762](https://tools.ietf.org/html/rfc6762) and is built on multicast UDP.
+
+Dat treats Hypercore public keys as domain names on the mDNS protocol. Therefore, peer discovery is an IP lookup for a given public key name. Currently the public key is encoded to hex and truncated to 40 bytes. The domain name format used is:


It is not the public key, but the Hypercore discovery key. This is an important detail: we never expose the public key over the network, as it is used as a capability.

It's also truncated because a DNS label (subdomain) can be at most 63 characters, and the full hex-encoded key is 64.
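A minimal sketch of the current naming scheme, assuming the 32-byte discovery key has already been computed: hex-encode it and keep the first 40 hex characters (20 bytes), which fits within the 63-character label limit. `mdns_name` is a hypothetical helper, not part of any Dat module.

```python
DNS_LABEL_MAX = 63  # RFC 1035: a single DNS label is at most 63 octets


def mdns_name(discovery_key: bytes) -> str:
    """Build the `{DISCOVERY_KEY}.dat.local` name used for mDNS lookups.

    Assumption: the caller passes the 32-byte discovery key (not the public
    key), and the scheme keeps only the first 40 hex chars (20 bytes).
    """
    if len(discovery_key) != 32:
        raise ValueError("discovery key must be 32 bytes")
    label = discovery_key.hex()[:40]
    if len(label) > DNS_LABEL_MAX:
        raise ValueError("label exceeds the DNS label limit")
    return f"{label}.dat.local"
```

With `bytes(range(32))` as a stand-in key, this yields `000102030405060708090a0b0c0d0e0f10111213.dat.local`.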

mafintosh · 2018-02-08T10:54:24Z

proposals/0000-peer-discovery.md

+Peer listings are a base64-encoded buffer of 6-byte peer items. Each peer item is packed as follows:
+
+```
+{4 bytes: IPv4 address}{2 bytes: port}


I think the port is big-endian.
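A sketch of the 6-byte peer item packing, assuming (per the comment above, not a confirmed spec) a big-endian port. `encode_peers` and `decode_peers` are hypothetical names.

```python
import base64
import socket
import struct
from typing import List, Tuple

# One peer item: 4-byte IPv4 address followed by a 2-byte port.
# "!" selects network (big-endian) byte order; this is the assumption above.
PEER_FORMAT = "!4sH"
PEER_SIZE = struct.calcsize(PEER_FORMAT)  # 6 bytes, no padding with "!"


def encode_peers(peers: List[Tuple[str, int]]) -> str:
    """Pack (ip, port) pairs into a base64-encoded peer listing."""
    raw = b"".join(
        struct.pack(PEER_FORMAT, socket.inet_aton(ip), port) for ip, port in peers
    )
    return base64.b64encode(raw).decode("ascii")


def decode_peers(listing: str) -> List[Tuple[str, int]]:
    """Unpack a base64-encoded peer listing back into (ip, port) pairs."""
    raw = base64.b64decode(listing)
    if len(raw) % PEER_SIZE:
        raise ValueError("peer listing length is not a multiple of 6 bytes")
    peers = []
    for offset in range(0, len(raw), PEER_SIZE):
        addr, port = struct.unpack_from(PEER_FORMAT, raw, offset)
        peers.append((socket.inet_ntoa(addr), port))
    return peers
```

Each peer item is 6 raw bytes, so every peer adds exactly 8 base64 characters to the listing.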

mafintosh · 2018-02-08T10:55:14Z

proposals/0000-peer-discovery.md

+To verify the addresses of clients, the DNS discovery protocol uses a session token exchange. All clients must first request a token before sending protocol messages. The server will generate the token using the following algorithm:
+
+```
+sha256(secret + client-address)


Personal note: we should use BLAKE2b here instead.
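For comparison, a sketch of the draft's `sha256(secret + client-address)` next to the BLAKE2b variant suggested here. Two assumptions that the draft does not specify: the client address is serialized as the string `"ip:port"`, and the BLAKE2b version passes the secret as the hash key rather than concatenating it.

```python
import hashlib

# Assumption: client addresses are serialized as "ip:port" strings.


def token_sha256(secret: bytes, client_address: str) -> bytes:
    # The draft's construction: plain SHA-256 over secret || address.
    return hashlib.sha256(secret + client_address.encode()).digest()


def token_blake2b(secret: bytes, client_address: str) -> bytes:
    # Keyed BLAKE2b: the secret is the key, giving a real keyed hash
    # instead of an ad hoc secret-prefix construction.
    return hashlib.blake2b(
        client_address.encode(), key=secret, digest_size=32
    ).digest()
```

One reason to prefer the keyed form: secret-prefix SHA-256 constructions are susceptible to length extension in general, while keyed BLAKE2b is designed to be used as a MAC.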

mafintosh · 2018-02-08T10:55:51Z

proposals/0000-peer-discovery.md

+
+The token is requested by sending a `TXT` record to the DNS server with a target name of `"dat.local"`. The server will respond with the token, plus the port and address of the sending device (which are useful as a "whoami").
+
+Over time, the server will rotate the secret it uses to generate tokens. In order to update clients' tokens, every response includes the latest token. The client should update its token with every response it receives. (It's advised that the server keeps the most recently expired secre so that old tokens can be accepted and replaced smoothly.)


secre -> secret
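One way a server might implement the rotation described in this hunk: keep the most recently expired secret so tokens minted under it are still accepted (and replaced by the next response). The `TokenIssuer` class is hypothetical, and it uses keyed BLAKE2b as proposed in the review rather than the draft's SHA-256.

```python
import hashlib
import hmac
import os
from typing import Optional


class TokenIssuer:
    """Issues session tokens bound to a client address, with secret rotation."""

    def __init__(self) -> None:
        self.current = os.urandom(32)
        self.previous: Optional[bytes] = None

    @staticmethod
    def _token(secret: bytes, client_address: str) -> bytes:
        # Assumption: addresses are "ip:port" strings; keyed BLAKE2b per review.
        return hashlib.blake2b(
            client_address.encode(), key=secret, digest_size=32
        ).digest()

    def issue(self, client_address: str) -> bytes:
        # Every response carries a token minted with the current secret.
        return self._token(self.current, client_address)

    def rotate(self) -> None:
        # Keep exactly one expired secret around for a smooth hand-over.
        self.previous, self.current = self.current, os.urandom(32)

    def verify(self, client_address: str, token: bytes) -> bool:
        # Accept tokens from the current or the most recently expired secret.
        for secret in (self.current, self.previous):
            if secret is not None and hmac.compare_digest(
                self._token(secret, client_address), token
            ):
                return True
        return False
```

A token survives one rotation and is rejected after the second, so a client that updates its token from every response never falls behind.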

mafintosh · 2018-02-08T10:56:20Z

proposals/0000-peer-discovery.md

+{PUBKEY}.dat.local
+```
+
+Dat uses the `TXT` record type. A query is submitted as a simple `TXT` query for `{PUBKEY}.dat.local`. The response provides a peer-listing which will only include the local node, if it is actively hosting the requested Hypercore.


{DISCOVERY_KEY}.dat.local
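To make the query concrete, a sketch that builds the raw DNS question for `{DISCOVERY_KEY}.dat.local` with type `TXT` (16) and class `IN` (1), using the standard DNS wire format that mDNS reuses. Actually sending it to the mDNS group (224.0.0.251, port 5353) is left out, the `build_txt_query` name is hypothetical, and Dat's additional-section fields (token, `subscribe`, `announce`) are not encoded here.

```python
import struct

QTYPE_TXT = 16
QCLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels ending in a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # root label terminates the name
    return bytes(out)


def build_txt_query(name: str, query_id: int = 0) -> bytes:
    """Build a single-question TXT query (RFC 6762 says multicast IDs SHOULD be 0)."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPE_TXT, QCLASS_IN)
    return header + question
```

For a 40-character key label, the packet is 68 bytes: a 12-byte header, a 52-byte encoded name, and 4 bytes of type and class.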

mafintosh · 2018-02-08T10:57:09Z

proposals/0000-peer-discovery.md

+### Lookup query
+[dns-name-server-lookup-query]: #dns-name-server-lookup-query
+
+To request the current list of known peers for a pubkey, send a `TXT` question query with `{PUBKEY}.dat.local` as the name. Currently the public key is encoded to hex and truncated to 40 bytes. You will receive a response that includes a full peer listing and the latest token. See "TXT data encoding" above for information about encoding.


{DISCOVERY_KEY}.dat.local

mafintosh · 2018-02-08T10:57:47Z

proposals/0000-peer-discovery.md

+
+If a `TXT` lookup query is sent with an "additional" section that does not have the `subscribe` flag, that is treated as an "unsubscribe" message and the device is removed from the active listeners.
+
+TODO- what's the TTL?


Good question. I think it's 1-2 minutes.

mafintosh · 2018-02-08T10:58:27Z

proposals/0000-peer-discovery.md

+
+The `announce` field instructs the DNS name server to add the device to the list of active hosting peers for the given Hypercore. Its value should be the port from which the device is listening. Multiple ports may be announced using separate queries. Upon announce, the new peer is pushed to any subscribed devices using an `SRV` query.
+
+TODO- how long till announce records expire? Should the client reannounce periodically?


Yes they should. They are GC'ed every 5-10 mins
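A sketch of the expiry behavior described here: the server records when each (address, port) last announced a discovery key and drops entries not refreshed within the GC window, which is why clients must re-announce. The `AnnounceTable` class and its 10-minute default are assumptions drawn from the comments in this thread, not from an implementation.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Peer = Tuple[str, int]


class AnnounceTable:
    def __init__(
        self, ttl_seconds: float = 600.0, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.ttl = ttl_seconds  # assumed GC window (thread says 5-10 minutes)
        self.clock = clock
        # discovery key (hex) -> {(ip, port): time of last announce}
        self.entries: Dict[str, Dict[Peer, float]] = defaultdict(dict)

    def announce(self, discovery_key: str, ip: str, port: int) -> None:
        # Each announced port is tracked separately; re-announcing refreshes it.
        self.entries[discovery_key][(ip, port)] = self.clock()

    def gc(self) -> None:
        # Drop peers whose last announce is older than the TTL.
        cutoff = self.clock() - self.ttl
        for key in list(self.entries):
            peers = {p: t for p, t in self.entries[key].items() if t >= cutoff}
            if peers:
                self.entries[key] = peers
            else:
                del self.entries[key]

    def lookup(self, discovery_key: str) -> List[Peer]:
        self.gc()
        return sorted(self.entries.get(discovery_key, {}))
```

Injecting the clock keeps the sketch testable; a client following this model would re-announce comfortably inside the TTL.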

mafintosh · 2018-02-08T10:59:08Z

proposals/0000-peer-discovery.md

+
+Mainline DHT is the name given to the Kademlia-based Distributed Hash Table (DHT) used by BitTorrent clients to find peers. Dat has adopted it temporarily to track peers in its own network. You can find the specification at [BEP 0005](http://www.bittorrent.org/beps/bep_0005.html).
+
+There are some issues with Dat's use of Mainline which limit the usefulness of its function. BitTorrent uses a 20 byte sha1 hash to identify torrents, while Dat uses a 32 byte public key to identify Hypercore registers. As a result, Dat has to truncate its keys to the first 20 bytes, leading to false positives when connecting to peers.


discovery key
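Applying this correction, the 20-byte identifier Dat puts on the Mainline DHT would be the first 20 bytes of the 32-byte discovery key, not of the public key. A minimal sketch; `dht_info_hash` is a hypothetical name.

```python
INFO_HASH_LEN = 20  # BEP 5 identifies torrents by a 20-byte info-hash


def dht_info_hash(discovery_key: bytes) -> bytes:
    """Truncate a 32-byte discovery key to the 20 bytes Mainline DHT expects."""
    if len(discovery_key) != 32:
        raise ValueError("discovery key must be 32 bytes")
    # The last 12 bytes are dropped, so any two keys sharing this 20-byte
    # prefix land on the same DHT entry.
    return discovery_key[:INFO_HASH_LEN]
```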

mafintosh · 2018-02-08T11:00:16Z

proposals/0000-peer-discovery.md

+# Privacy concerns
+[privacy-concerns]: #privacy-concerns
+
+Peer discovery networks reveal the participants in a Dat swarm to any device which can access the network. This presents a privacy risk for users who may not want to have their activity broadcasted.


The main thing leaked is who is talking to whom (which is of course important). We never leak the capability (the public key), so passive listeners cannot access or decrypt data, though they can probably also see that Alice and Bob are talking to each other.

anarcat · 2018-08-25T01:38:33Z

It's important to indicate who that information is leaked to. Elsewhere in the documentation (e.g. in the security FAQ) we are led to believe that information is leaked only to the members of the swarm, which is not really accurate. Sure, the contents are visible only to the members of the swarm, but metadata like public (and private?) IP addresses and relationships between people are spread out much more widely than I first believed when reviewing the protocol.

In particular, if I understand this DEP correctly, it implies that discoveryN.datproject.org knows precisely:

- when a peer comes online (when Alice runs `dat share`)
- when a peer looks for content (when Bob runs `dat clone $ALICEHASH`)
- that Alice and Bob are related

This raises all sorts of privacy concerns which should be answered by the Dat project. For example:

- Does the discovery server keep logs?
- What is the retention policy?
- Who has access to those logs?

I think the current section about privacy concerns is great, but it should be expanded to cover this peculiar property of the protocol. The security FAQ should also be updated to mention this, but that's a separate issue: I've documented my concerns with that in dat-ecosystem-archive/docs#127

mafintosh · 2018-02-08T11:00:51Z

proposals/0000-peer-discovery.md

+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+ - Does the DNS network *need* to truncate the public key to 40 bytes? Could we fit the full 64 bytes by using another level of subdomain?


Yeah, good idea: 32-chars.32-chars, or use an encoding other than hex that is still DNS-friendly. Thoughts?

emilbayes · Feb 8, 2018

You could hash the discovery key to something like 63 bytes?

mafintosh · Feb 8, 2018

If we hash to 63 bytes (not change encoding) we're basically just losing a byte of specificity. Why not just do 32.32 and stick with hex? We could also switch to base32.

pfrazee · Feb 8, 2018

32.32 seems reasonable.
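A sketch of the two full-length encodings floated in this thread, neither of which is implemented: splitting the 64 hex characters of the discovery key into two 32-character labels, or using lowercase unpadded base32, which fits all 32 bytes into a single 52-character label. Both function names are hypothetical.

```python
import base64


def name_hex_split(discovery_key: bytes) -> str:
    """'32.32' proposal: two 32-char hex labels carrying all 32 bytes."""
    h = discovery_key.hex()
    return f"{h[:32]}.{h[32:]}.dat.local"


def name_base32(discovery_key: bytes) -> str:
    """Base32 alternative: DNS names are case-insensitive, so lowercase
    the label and drop the '=' padding to keep it DNS-friendly."""
    label = base64.b32encode(discovery_key).decode("ascii").rstrip("=").lower()
    return f"{label}.dat.local"
```

The hex split keeps today's encoding at the cost of one extra label; base32 keeps a single label and stays under the 63-character limit.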

bnewbold

Good start! I think the main thing is pubkey/discovery key confusion.

I might have more detailed comments after I implement more of these mechanisms; right now geniza just does passive lookups.

EDIT: whoops, looks like these comments might be redundant with maf's; I made these asynchronously and just got back to an internet connection now.

bnewbold · 2018-02-09T01:35:34Z

proposals/0000-peer-discovery.md

+
+An important aspect of Dat's networking is peer discovery, the techniques that peers use to find each other. Peer discovery means finding the IP and port of data sources online that have a copy of that data you are looking for. You can then connect to them and begin exchanging data. By using peer discovery techniques Dat is able to create a network where data can be discovered even if the original data source disappears.
+
+Peer discovery can happen over many kinds of networks, as long as you can model the following actions:


I'm not sure how I feel about encoding these function signatures here. I did something similar in the hyperdb DEP, but here it feels really specific to the existing dat implementation. Maybe just describe what semantics are necessary for lookup, and what additional semantics are necessary to announce/cancel membership or subscribe to a feed of peer updates?

bnewbold · 2018-02-09T01:35:39Z

proposals/0000-peer-discovery.md

+
+Multicast DNS (mDNS) resolves host names to IP addresses within small networks without a local name server. It is a zero-configuration service, using essentially the same interfaces, packet formats and operating semantics as unicast DNS. The mDNS protocol is published as [RFC 6762](https://tools.ietf.org/html/rfc6762) and is built on multicast UDP.
+
+Dat treats Hypercore public keys as domain names on the mDNS protocol. Therefore, peer discovery is an IP lookup for a given public key name. Currently the public key is encoded to hex and truncated to 40 bytes. The domain name format used is:


I'm pretty sure we use discovery keys (not public keys) for all discovery mechanisms.

The discovery key is a BLAKE2b "keyed hash" of the string "hypercore" using the public key (32 bytes), described in the wire protocol DEP (WIP).
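Following that description, a sketch of deriving the discovery key: BLAKE2b with a 32-byte digest over the message `hypercore`, keyed with the 32-byte public key. It is written against Python's `hashlib` rather than libsodium's `crypto_generichash`; both are BLAKE2b, but treat byte-for-byte equivalence with the reference implementation as an assumption to check against the wire protocol DEP.

```python
import hashlib


def discovery_key(public_key: bytes) -> bytes:
    """Derive a Hypercore discovery key from its 32-byte public key."""
    if len(public_key) != 32:
        raise ValueError("public key must be 32 bytes")
    # Keyed hash: the public key is the BLAKE2b key and b"hypercore" is the
    # message, so the result can be shared without revealing the public key.
    return hashlib.blake2b(b"hypercore", key=public_key, digest_size=32).digest()
```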

bnewbold · 2018-02-09T01:35:59Z

proposals/0000-peer-discovery.md

+Over time, the server will rotate the secret it uses to generate tokens. In order to update clients' tokens, every response includes the latest token. The client should update its token with every response it receives. (It's advised that the server keeps the most recently expired secre so that old tokens can be accepted and replaced smoothly.)
+
+
+### Lookup query


SRV requests are also possible, e.g. on the command line:

dig @discovery1.publicbits.org 905fd1b6504698425e8bec3dbb77d757e281d505.dat.local SRV

returns something like:

0 0 44113 172.19.0.4.

Which, IIRC, is port 44113 on host 172.19.0.4 (note the trailing period, which is not a typo).
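The SRV answer follows the standard RDATA order from RFC 2782 (priority, weight, port, target). A small sketch that parses a `dig`-style answer like the one above; `parse_srv` is hypothetical, and it simply strips the trailing root dot from the target, which in this example is an IPv4 address written as a name (unusual for SRV, where the target is normally a hostname).

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(rdata: str) -> SrvRecord:
    """Parse 'priority weight port target' as printed by dig."""
    priority, weight, port, target = rdata.split()
    # The trailing dot marks a fully qualified name (the DNS root); strip it.
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))
```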

bnewbold · 2018-02-09T01:36:04Z

proposals/0000-peer-discovery.md

+
+The `announce` field instructs the DNS name server to add the device to the list of active hosting peers for the given Hypercore. Its value should be the port from which the device is listening. Multiple ports may be announced using separate queries. Upon announce, the new peer is pushed to any subscribed devices using an `SRV` query.
+
+TODO- how long till announce records expire? Should the client reannounce periodically?


I think 10 minutes, and clients should re-announce, but I don't have a reference.

bnewbold · 2018-02-09T01:36:10Z

proposals/0000-peer-discovery.md

+There are some issues with Dat's use of Mainline which limit the usefulness of its function. BitTorrent uses a 20 byte sha1 hash to identify torrents, while Dat uses a 32 byte public key to identify Hypercore registers. As a result, Dat has to truncate its keys to the first 20 bytes, leading to false positives when connecting to peers.
+
+
+# Privacy concerns


Here are two other privacy concerns off the top of my head:

- The concern of being able to discover (and potentially download) all content "discovered" via these mechanisms. This is mitigated by using discovery keys (instead of public keys) for download.

- The ability to discover who has what content on the network (if you know a priori what content is associated with which discovery keys). E.g., imagine somebody sharing leaked documents: if the documents (the Dat archive, and thus the discovery key) become public, somebody can make a list of all peers who have exchanged (or "knew of") that archive. I don't know of any mitigation for this right now. This is also a concern with the wire protocol; in that case it could be mitigated by encrypting the entire transaction (including the discovery key verification), but not with the current encryption scheme.

tristanls · 2018-02-10T04:36:30Z

proposals/0000-peer-discovery.md

+### Lookup query
+[dns-name-server-lookup-query]: #dns-name-server-lookup-query
+
+To request the current list of known peers for a pubkey, send a `TXT` question query with `{PUBKEY}.dat.local` as the name. Currently the public key is encoded to hex and truncated to 40 bytes. You will receive a response that includes a full peer listing and the latest token. See "TXT data encoding" above for information about encoding.


> You will receive a response that includes a full peer listing

If I go through a valid "probe" step, acquire a session token, and then announce multiple ports, that would seem to increase the full peer listing arbitrarily.

Since simple lookups do not require a token, it should be possible for me to spoof the IP address in simple lookups and use a (previously constructed) arbitrarily large full peer listing to execute a DoS on a target.

pfrazee · Feb 10, 2018

Good find, this is a weakness cc @mafintosh
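A back-of-the-envelope sketch of the concern: a TXT lookup for a 40-character name is a fixed 68-byte query, while the base64 peer listing in the answer grows by 8 characters per announced (ip, port) item. The helper names are hypothetical, and because only the listing is counted, the ratio is a lower bound on the real amplification.

```python
import math


def query_size(name: str) -> int:
    """Bytes in a single-question DNS query: header + encoded name + type/class."""
    encoded_name = sum(len(label) + 1 for label in name.split(".")) + 1
    return 12 + encoded_name + 4


def listing_size(num_peers: int) -> int:
    """Base64 length of a listing of 6-byte peer items."""
    return 4 * math.ceil(6 * num_peers / 3)


def amplification(name: str, num_peers: int) -> float:
    # Counts only the peer listing, so real responses are at least this large.
    return listing_size(num_peers) / query_size(name)
```

For example, 100 announced ports give an 800-byte listing in reply to a 68-byte query, roughly a 12x amplification before counting the rest of the response.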

pfrazee · 2018-02-15T19:16:56Z

OK, I covered most of the feedback (thanks all!). A few TODOs are written in the Unresolved questions. There are also some changes that we're going to implement in March, so I'm going to leave this DEP for now.

anarcat

The privacy concerns section should be expanded to cover discoveryN.datproject.org more explicitly; details in the comment above.

Add proposals/0000-peer-discovery.md (264c94e)

pfrazee changed the title ~~Add proposals/0000-peer-discovery.md (WIP)~~ Peer Discovery DEP (WIP) Feb 7, 2018

mafintosh reviewed Feb 8, 2018

bnewbold reviewed Feb 9, 2018

tristanls reviewed Feb 10, 2018

pfrazee added 6 commits February 15, 2018 13:04

- Remove message signatures (29c5d51)
- Document discovery keys (fc66ffc)
- Document attack concern (2971ad8)
- Document port encoding (198c809)
- Add todo - use blake2b to generate dns session tokens (a6cd86c)
- Add TTL info (653e0cf)

pfrazee changed the title ~~Peer Discovery DEP (WIP)~~ WIP: Peer Discovery DEP Mar 4, 2018

anarcat suggested changes Aug 25, 2018

anarcat mentioned this pull request Aug 25, 2018

document DNS discovery/registration procedures in the security section dat-ecosystem-archive/docs#127

yoshuawuyts mentioned this pull request Oct 28, 2018

Implement peers / network datrs/hypercore#11
